Why was this show chosen? Due to all artists being established, they have a fanbase, which could play into the results. On survival shows where aspiring trainees are trying to debut, many trainees are unknown which means that they don’t have any influence from before the show starting.
The format of the show also makes it better than other shows for analysis, due to the outcomes of the contestants being evenly divided, with 7 people eliminated or debuting during each phase. The question this analysis asks how does the various factors of a group/artist affect their placement on Queendom Puzzle?
library(ggplot2)
library(readxl)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──
## ✔ broom 1.0.5 ✔ rsample 1.2.0
## ✔ dials 1.2.0 ✔ tibble 3.2.1
## ✔ infer 1.0.5 ✔ tidyr 1.3.0
## ✔ modeldata 1.2.0 ✔ tune 1.1.2
## ✔ parsnip 1.1.1 ✔ workflows 1.1.3
## ✔ purrr 1.0.2 ✔ workflowsets 1.0.1
## ✔ recipes 1.0.9 ✔ yardstick 1.2.0
## Warning: package 'recipes' was built under R version 4.3.0
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ recipes::step() masks stats::step()
## • Learn how to get started at https://www.tidymodels.org/start/
QP_data <- read_excel("QP-Ranking.xlsx")
QP_Ranking <- as_tibble(QP_data)
head(QP_Ranking)
## # A tibble: 6 × 12
## episode initial_rank name ranking signal_song initial_group remix_song
## <dbl> <dbl> <chr> <dbl> <chr> <chr> <chr>
## 1 1 4 Nana 22 PICK on the top Woo!ah! Nxde
## 2 1 1 Yeoreum 1 PICK-CAT WJSN Don't Call…
## 3 1 4 Hwiseo 22 Athena H1-KEY Shut Down
## 4 1 1 Kei 1 Athena Lovelyz Only One
## 5 1 2 Yeeun 8 PICK-CAT CLC Don't Call…
## 6 1 2 Jihan 8 PICK on the top Weeekly Nxde
## # ℹ 5 more variables: all_rounder_team <chr>, vocal_rap_song <chr>,
## # dance_song <chr>, semi_final_song <chr>, finale_song <chr>
QP_Ranking_EL7ZUP <- QP_Ranking %>%
filter(name == "Hwiseo" | name == "Nana" | name == "Yuki" | name == "Kei" | name == "Yeoreum" | name == "Yeonhee" | name == "Yeeun")
gfg_plot <- ggplot(QP_Ranking_EL7ZUP, aes(x=episode, y=ranking, group=name, color=name)) +
geom_line() +
geom_point() +
xlab("Episode Number") +
ylab("Rank") +
scale_x_continuous(breaks=1:10) +
scale_y_continuous(trans="reverse", breaks=1:28) +
facet_wrap(~name) +
scale_colour_discrete(name="Name")
gfg_plot
gfg_plot2 <- ggplot(QP_Ranking, aes(x=episode, y=ranking, group=name, color=name)) +
geom_line() +
geom_point() +
xlab("Episode Number") +
ylab("Rank") +
scale_x_continuous(breaks=1:10) +
scale_y_continuous(trans="reverse", breaks=1:28) +
facet_wrap(~name) +
scale_colour_discrete(name="Name")
gfg_plot2
gfg_plot3 <- ggplot(QP_Ranking, aes(x=episode, y=ranking, group=name, color=signal_song)) +
geom_line() +
geom_point() +
xlab("Episode Number") +
ylab("Rank") +
scale_x_continuous(breaks=1:10) +
scale_y_continuous(trans="reverse", breaks=1:28) +
facet_wrap(~signal_song) +
scale_colour_discrete(name="Signal Song")
gfg_plot3
gfg_plot4 <- ggplot(QP_Ranking, aes(x=episode, y=ranking, group=name, color=initial_group)) +
geom_line() +
geom_point() +
xlab("Episode Number") +
ylab("Rank") +
scale_x_continuous(breaks=1:10) +
scale_y_continuous(trans="reverse", breaks=1:28) +
facet_wrap(~initial_group) +
scale_colour_discrete(name="Affiliation")
gfg_plot4
QP_Ranking_Remix <- filter(QP_Ranking, remix_song != 'NA')
gfg_plot5 <- ggplot(QP_Ranking_Remix, aes(x=episode, y=ranking, group=name, color=remix_song)) +
geom_line() +
geom_point() +
xlab("Episode Number") +
ylab("Rank") +
scale_x_continuous(breaks=1:10) +
scale_y_continuous(trans="reverse", breaks=1:28) +
facet_wrap(~remix_song) +
scale_colour_discrete(name="Remix")
gfg_plot5
To analyze popularity, I will use the ranks given on the first episode of the show that were based on their prior career experience in the industry. This is a fair metric of popularity to compare, except for the fact that Fye and Miru are ranked low as they have no K-pop experience, despite having a fanbase outside of it which could influence the votes.
summarized_data <- QP_Ranking %>%
group_by(name)
summarized_split <- initial_split(summarized_data, prop = 0.75, strata = initial_rank)
summarized_training <- training(summarized_split)
summarized_testing <- testing(summarized_split)